Add task.verify, dynamic Stripe keys, and capability warnings by amittell · Pull Request #3 · amittell/agentcli

amittell · 2026-03-30T18:15:41Z

Summary

task.verify (closes feat: add task.verify for post-completion outcome verification #2): New verify field on tasks runs a shell command after completion to confirm the expected outcome exists. Supports on_failure: error|warn and configurable timeout.
Dynamic Stripe key minting: key_strategy: "dynamic" in the stripe-api-key provider now mints restricted keys via the Stripe API with automatic cleanup. Includes scope-based permission mapping and expiry management.
Capability warnings: validateManifestCapabilities now returns { errors, warnings }. Soft warnings surface when manifests use runtime_identity_resolution or credential_handoff (downscope) against schedulers that don't advertise those features.
Trust boundary docs: Architecture and execution-identity docs updated with scheduler/child trust boundary and principal separation model.

Test plan

585 tests passing locally (19 verify + 45 dynamic Stripe + 6 capability + existing)
Lint clean
Verify task.verify: create manifest with verify.shell, execute, confirm pass/fail behavior
Verify dynamic Stripe: test with Stripe test-mode master key and restricted key minting
Verify capability warnings: apply manifest with provider identity to scheduler without runtime_identity_resolution

Summary by CodeRabbit

New Features
- Post-task "verify" step (configurable shell, timeout, on-failure); verify outcomes now affect execution result, audit, and returned outputs.
- Runtime downscoping with Stripe-backed dynamic restricted-key minting and best-effort revocation for child credential narrowing.
Documentation
- Scheduler–child trust boundary, credential control-flow, and execution principal separation guidance; verify field documented in field reference.
Improvements
- Capability validator now returns errors + advisory warnings; CLI surfaces warnings.
Tests
- Expanded coverage for verify behavior and dynamic key lifecycle.

…ing v0.2 follow-ups task.verify (agentcli#2): - New verify field on tasks: shell command runs after completion to confirm outcome - verify.shell (required), verify.timeout_seconds (default 30), verify.on_failure (error|warn) - Full pipeline: validation, schema, compiler, exec (v1+v2), scheduler fields - 19 new tests Dynamic Stripe key minting: - Implement key_strategy: "dynamic" in stripe-api-key provider - Mint restricted keys via POST /v1/api_keys with master key auth - Cleanup via DELETE /v1/api_keys/{id} (best-effort) - Dynamic prepareHandoff mints narrower keys for child scope - 45 new tests with mock HTTP server Capability warnings: - validateManifestCapabilities returns { errors, warnings } instead of flat array - Soft warnings for runtime_identity_resolution and credential_handoff (downscope) - Updated all callers (apply.js, cli.js, runtime adapter) - 6 new tests DRY dedup: already resolved (schedulerCreateSpec shared via apply.js import)

…ation Add architecture docs explaining the credential flow control surfaces, child credential policy enforcement model, and separation between agentcli (control plane) and scheduler (execution runtime) roles.

coderabbitai · 2026-03-30T18:15:50Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

📝 Walkthrough

Walkthrough

Adds a task-level post-success "verify" step (schema, validation, compile, runtime), surfaces capability warnings alongside errors, implements Stripe-backed dynamic restricted API key minting/handoff with cleanup, and documents scheduler/child trust boundaries and execution/control-plane identity separation.

Changes

Cohort / File(s)	Summary
Documentation `docs/architecture.md`, `docs/execution-identity.md`, `docs/field-reference.md`	New trust-boundary and execution-principal sections; added workflow/task `verify` docs and "Task Verify Fields".
Capability validation & CLI `src/capabilities.js`, `src/apply.js`, `src/cli.js`, `src/runtime/openclaw-scheduler.js`	`validateManifestCapabilities` now returns `{ errors, warnings }`; added non-blocking warnings for missing runtime identity/credential handoff; call sites emit and surface warnings.
Verify schema & validation `src/schema.js`, `src/validate.js`	Added reusable `verify` schema; allow `verify` on workflows/tasks; implemented `validateVerify` enforcing `shell` (required string), `timeout_seconds` ≥1, `on_failure` ∈ {error,warn}.
Verify resolution & compilation `src/compiler/shared.js`, `src/compiler/openclaw-scheduler.js`, `src/scheduler-fields.js`	Exported `resolveVerify(workflow, task)` with precedence/defaults; compiled jobs include `verify_shell`, `verify_timeout_s`, `verify_on_failure`; added fields to `SCHEDULER_FIELDS_V02`.
Verify runtime & auditing `src/exec.js`	Threaded `verify` into execution paths; added `runVerify` (sync shell exec with timeout); integrated `verify.on_failure` semantics, included `verify` results in outputs/audits, adjusted effective success and audit-on-failure, and refined cleanup conditions for provider artifacts.
Stripe dynamic key strategy `src/identity/stripe-api-key.js`	Implemented `dynamic` key_strategy: master-key resolution, permission mapping/encoding, Stripe create/delete helpers, minting of restricted keys for materialization and handoff, session embedding of minted key, and best-effort cleanup; exported internals for tests.
Schema projection & scheduler fields `src/schema.js`, `src/scheduler-fields.js`	Extended manifest/schema and scheduler projection to include `verify` and scheduler-level `verify_*` fields.
Tests `test/agentcli.test.js`	Updated tests for new capability return shape and warnings; added extensive `verify` validation/compile/exec tests and comprehensive Stripe dynamic key minting/handoff/cleanup tests (includes mock HTTP server).
Minor wiring `src/apply.js`, `src/runtime/openclaw-scheduler.js` (small changes)	Stored capability warnings and adjusted destructuring at runtime adapter call site.

Sequence Diagram(s)

sequenceDiagram
    participant Scheduler as Scheduler
    participant Executor as Task Runner
    participant Verify as Verify Runner
    participant Audit as Audit Logger
    participant Stripe as Stripe API

    Scheduler->>Stripe: (if dynamic) POST create restricted key (mint)
    Stripe-->>Scheduler: return key id/secret/expires
    Scheduler->>Executor: dispatch compiled job (includes verify, materialization/handoff metadata, narrowed creds)
    Executor->>Executor: run main task command (spawn)
    Executor->>Executor: capture exit_code, stdout, stderr

    alt exit_code == 0 and verify present
        Executor->>Verify: runVerify(shell, timeout_seconds)
        Verify->>Verify: spawnSync('sh', ['-c', ...]) with timeout
        Verify-->>Executor: {passed, exit_code, stdout, stderr, timed_out, duration_ms}

        alt verify.passed or verify.on_failure == 'warn'
            Executor->>Audit: append verify result (warning if applicable)
        else
            Executor->>Executor: mark verify_failed, effectiveOk=false
            Executor->>Audit: emit failure audit (includes verify result)
            Executor->>Scheduler: propagate verify_failed error
        end
    else
        Executor->>Audit: log main execution result (no verify)
    end

    opt cleanup/handoff for dynamic keys
        Scheduler->>Stripe: DELETE restricted key (best-effort revoke)
    end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

Poem

🐰 I hopped through schemas, shells, and keys,

I checked the run and salted logs with ease,
I minted tiny tokens, scoped so neat,
I ran a verify, then tucked results to keep,
A rabbit cheers these changes — nibble a carrot please 🥕

Note

🎁 Summarized by CodeRabbit Free

Your organization is on the Free plan. CodeRabbit will generate a high-level summary and a walkthrough for each pull request. For a comprehensive line-by-line review, please upgrade your subscription to CodeRabbit Pro by visiting https://app.coderabbit.ai/login.

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Copilot

Pull request overview

Adds post-run verification for tasks, implements dynamic Stripe restricted-key minting with cleanup, and enhances capability checking to return both hard errors and soft warnings (plus updated trust-boundary documentation).

Changes:

Introduce task.verify / workflow.verify validation, compilation to scheduler fields, and execution-time verify behavior.
Implement stripe-api-key provider key_strategy: "dynamic" with Stripe API mint/revoke helpers and extensive tests.
Update capability validation to return { errors, warnings } and surface warnings in CLI/apply paths; expand architecture/security docs.

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 4 comments.

Show a summary per file

File	Description
test/agentcli.test.js	Updates capability tests to new return shape and adds extensive dynamic Stripe + verify test coverage.
src/validate.js	Adds `verify` to known keys and validates `verify` blocks for workflows/tasks.
src/schema.js	Extends manifest schema with `verify` fields and adds scheduler job verify columns.
src/scheduler-fields.js	Adds verify fields to v0.2 scheduler field list.
src/runtime/openclaw-scheduler.js	Adapts to `{ errors, warnings }` return shape (uses errors).
src/identity/stripe-api-key.js	Implements dynamic key minting + cleanup, adds helpers/constants, exports for tests.
src/exec.js	Adds verify execution phase and returns/audits verify results; verify failures can flip `ok` / throw.
src/compiler/shared.js	Adds `resolveVerify()` and includes `verify` in normalized task plans.
src/compiler/openclaw-scheduler.js	Compiles verify settings to scheduler job fields.
src/cli.js	Surfaces capability warnings in `--check-capabilities` output.
src/capabilities.js	Changes API to return `{ errors, warnings }` and adds new warning checks.
src/apply.js	Prints capability warnings to stderr during apply while still gating on errors.
docs/execution-identity.md	Adds security-model section describing principal separation/trust boundary.
docs/architecture.md	Documents scheduler/child trust boundary and credential strategy model.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…ey guard - Embed session in materialize result so cleanup() can access stripe_key_id without callers needing to thread session through ctx - Restrict http:// api_base to localhost unless allow_insecure_http is set, preventing accidental credential leak over plain HTTP - Add min: 1 to verify_timeout_s in published schema - Guard resolveMasterKey against missing/non-object master_key_source

amittell · 2026-03-30T19:17:30Z

Follow-up review pass pushed in e0e1e46.

Addressed two execution-path regressions in this PR:

task.verify now runs in the effective task cwd/env, so shell-local env vars and v0.2 materialized credentials are visible to the verify step.
Dynamic Stripe sessions now revoke minted restricted keys from exec, including dry-run handoff preparation, and cleanup is awaited so warnings/results are deterministic.

Also wired task timeout into dynamic session resolution and documented the new workflow/task verify block in the field reference.

Local verification: npm test, npm run lint.

amittell · 2026-03-30T19:17:40Z

Superseded by the later follow-up comments on this PR. The previous body was malformed by shell escaping; the current review summary is in the newer owner comments and review notes.

amittell

Validation

Ran npm test locally on the PR head: all tests passed (full node --test suite including new coverage for task.verify, Stripe dynamic keys, and capability warnings).

Summary

Solid, cohesive v0.2 follow-up: post-exec verify, dynamic Stripe restricted-key minting with cleanup and HTTP hardening, capability negotiation split into hard errors vs soft warnings, plus helpful architecture / trust-boundary docs. Tests are extensive and match the behavioral contracts described in commit messages.

Strengths

Verify pipeline is wired end-to-end: validation, schema (min: 1 on timeout), resolveVerify merge (task overrides workflow), compiler → scheduler fields, and both v0.1 / v0.2 exec paths with correct cwd/env (including materialized identity env in v0.2).
Stripe dynamic: master key resolution is guarded; api_base blocks non-local http: unless opted in; mint/delete paths are tested with a mock HTTP server; materialization embeds session so cleanup() can revoke keys without threading extra context.
Capabilities: returning { errors, warnings } is a clean upgrade; --check-capabilities exposes structured warnings for automation.
Docs clearly separate agentcli control plane vs scheduler runtime — good for security reviews.

Follow-ups (non-blocking; also left inline)

Evidence vs verify (v0.2): Evidence is attested before the verify shell runs, so envelopes can show success even when verify later fails. Document or adjust if evidence is meant to cover “verify included” outcomes.
apply --json: Capability soft-warnings only hit stderr; consider surfacing them in the JSON payload like --check-capabilities does.
Stripe: Update stale resolveSession JSDoc that still says dynamic is unimplemented.
Capabilities: Handoff vs downscope warning/error asymmetry is intentional but subtle — worth a sentence in field-reference.

Overall: No test regressions found; approve-worthy from a correctness/CI perspective with the above documentation/product clarifications as polish.

…c, capability cross-link - Document evidence vs verify ordering in exec.js (evidence attests command outcome, verify is a separate operator-local deliverable check) - Include capability warnings in apply JSON output (capabilities.warnings) so automation parsing stdout gets structured warning data - Fix stale JSDoc claiming dynamic strategy is not implemented - Add cross-link comment explaining hard error (presentation.handoff) vs soft warning (child_credential_policy downscope) asymmetry

amittell · 2026-03-30T21:30:45Z

One more follow-up commit is on the PR branch: e8a5808.

I found a second-order contract issue in the new verify path: the main task respected strict sandbox/network enforcement, but the post-success verify command was still shelling out directly. That meant verify could bypass the task contract on supported local sandbox runners.

This is now fixed by running verify through the same sandbox wrapper/profile when strict enforcement is active, with a regression test covering the darwin sandboxed path.

Validation rerun after the update: npm test, npm run lint.

amittell · 2026-03-31T01:46:33Z

Polish follow-up pushed in 160f856.

This pass does not change runtime behavior. It tightens the review surface around the current PR head:

added a regression proving applyManifestToScheduler() keeps capability warnings in the structured result while still emitting the human-readable stderr advisory
documented that task.verify runs after evidence generation and is recorded separately from the attested command result
documented why identity.presentation.handoff is a hard compatibility gate while child_credential_policy="downscope" is only a warning when credential_handoff is unavailable
cleaned up the malformed duplicate owner comment and resolved the remaining stale non-blocking review threads

Validation rerun on this head: npm test, npm run lint.

amittell · 2026-04-01T05:26:47Z

Post-merge assessment on PR #3 (160f8564d7e8409be3aec93c03185c06d0613260) plus current main:

I did not find any remaining blocking issues in the merged PR changes themselves. The task.verify, dynamic Stripe key, and capability-warning work all looks coherent after the earlier follow-up docs/tests.
I did find one current-main runtime gap adjacent to this PR: delegated exec handoff to openclaw-scheduler was enforcing hard capability errors but dropping soft capability warnings from the structured delegation receipt. That made exec inconsistent with apply --check-capabilities / apply for the same manifest shape.
I also found that examples/ansible-ops.json had drifted out of schema (contract.audit: "full" instead of none | on-failure | always), which was breaking the repo’s all example manifests validate and compile test on clean main.

I pushed a small follow-up branch with both fixes:

branch: followup/delegation-capability-warnings
commit: a77630d
compare: main...followup/delegation-capability-warnings

Validation on that branch:

npm test: 592/592 passing
npm run lint: passing

The new regression test covers delegated exec returning warnings / capability_warnings when the scheduler lacks runtime_identity_resolution, and the example repair restores the green full-suite baseline on main.

amittell added 2 commits March 30, 2026 12:22

Document scheduler/child trust boundary and execution principal separ…

8df7a18

…ation Add architecture docs explaining the credential flow control surfaces, child credential policy enforcement model, and separation between agentcli (control plane) and scheduler (execution runtime) roles.

Copilot AI review requested due to automatic review settings March 30, 2026 18:15

amittell mentioned this pull request Mar 30, 2026

Add x-openclaw-env-inject header for agent task credentials amittell/openclaw-scheduler#5

Merged

6 tasks

Copilot started reviewing on behalf of amittell March 30, 2026 18:16 View session

Copilot AI reviewed Mar 30, 2026

View reviewed changes

Comment thread src/identity/stripe-api-key.js

Comment thread src/identity/stripe-api-key.js Outdated

Comment thread src/schema.js Outdated

Comment thread src/identity/stripe-api-key.js

amittell added 2 commits March 30, 2026 14:39

Fix verify context and dynamic Stripe cleanup

e0e1e46

amittell commented Mar 30, 2026

View reviewed changes

Comment thread src/exec.js

Comment thread src/apply.js

Comment thread src/identity/stripe-api-key.js

Comment thread src/capabilities.js

amittell added 2 commits March 30, 2026 16:17

Sandbox verify commands under strict contracts

e8a5808

Clarify verify and capability warning semantics

160f856

amittell merged commit 435ef2e into main Mar 31, 2026
2 checks passed

amittell deleted the feature/remaining-v02-tasks branch March 31, 2026 02:17

Conversation

amittell commented Mar 30, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Uh oh!

coderabbitai Bot commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amittell commented Mar 30, 2026

Uh oh!

amittell commented Mar 30, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

amittell left a comment

Choose a reason for hiding this comment

Validation

Summary

Strengths

Follow-ups (non-blocking; also left inline)

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

amittell commented Mar 30, 2026

Uh oh!

amittell commented Mar 31, 2026

Uh oh!

Uh oh!

amittell commented Apr 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

amittell commented Mar 30, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Mar 30, 2026 •

edited

Loading

amittell commented Mar 30, 2026 •

edited

Loading